DinTucker: Scaling Up Gaussian Process Models on Large Multidimensional Arrays
Authors
Abstract
Tensor decomposition methods are effective tools for modelling multidimensional array data (i.e., tensors). Among them, nonparametric Bayesian models such as Infinite Tucker Decomposition (InfTucker) are more powerful than multilinear factorization approaches, including Tucker and PARAFAC, and usually achieve better predictive performance. However, they struggle to handle massive data due to a prohibitively high training cost. To address this limitation, we propose Distributed Infinite Tucker (DinTucker), a new hierarchical Bayesian model that enables local learning of InfTucker on subarrays and global integration of the local results. We further develop a distributed stochastic gradient descent algorithm, coupled with variational inference, for model estimation. In addition, we reveal the connection between DinTucker and InfTucker in terms of model evidence. Experiments demonstrate that DinTucker maintains the predictive accuracy of InfTucker and scales to massive data: on multidimensional arrays with billions of elements from two real-world applications, DinTucker achieves significantly higher prediction accuracy with less training time than the state-of-the-art large-scale tensor decomposition method, GigaTensor.
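The local-learning idea in the abstract, fitting on sampled subarrays and folding the updates back into shared parameters, can be illustrated with a toy sketch. This is not the paper's algorithm (DinTucker is a nonparametric Bayesian model estimated with variational inference); the sketch below assumes a plain finite Tucker factorization with squared loss, updated by SGD on random subarrays, and all helper names (`reconstruct`, `subarray_sgd_step`) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruct(core, U1, U2, U3):
    """Tucker multilinear product: core x_1 U1 x_2 U2 x_3 U3."""
    return np.einsum('abc,ia,jb,kc->ijk', core, U1, U2, U3)

def subarray_sgd_step(T, core, U1, U2, U3, idx, lr=0.01):
    """One gradient step on the squared loss restricted to a subarray.

    Only the factor rows indexed by the subarray (plus the shared core)
    are touched, mimicking local learning on a small block of T."""
    i, j, k = idx
    sub = T[np.ix_(i, j, k)]
    A, B, C = U1[i], U2[j], U3[k]
    err = np.einsum('abc,ia,jb,kc->ijk', core, A, B, C) - sub
    # Gradients of 0.5 * ||err||^2 w.r.t. the local factor blocks and the core.
    gA = np.einsum('ijk,ajk->ia', err, np.einsum('abc,jb,kc->ajk', core, B, C))
    gB = np.einsum('ijk,bik->jb', err, np.einsum('abc,ia,kc->bik', core, A, C))
    gC = np.einsum('ijk,cij->kc', err, np.einsum('abc,ia,jb->cij', core, A, B))
    gCore = np.einsum('ijk,ia,jb,kc->abc', err, A, B, C)
    U1[i] -= lr * gA
    U2[j] -= lr * gB
    U3[k] -= lr * gC
    core -= lr * gCore  # in-place update of the shared core

# Synthetic low-rank ground-truth tensor.
n, r, m = 20, 3, 8
true_core = 0.5 * rng.standard_normal((r, r, r))
true_U = [0.5 * rng.standard_normal((n, r)) for _ in range(3)]
T = reconstruct(true_core, *true_U)

# Random initialization, then many local steps on random subarrays.
core = 0.3 * rng.standard_normal((r, r, r))
U1, U2, U3 = (0.3 * rng.standard_normal((n, r)) for _ in range(3))

loss0 = np.linalg.norm(reconstruct(core, U1, U2, U3) - T)
for _ in range(500):
    idx = [rng.choice(n, size=m, replace=False) for _ in range(3)]
    subarray_sgd_step(T, core, U1, U2, U3, idx)
loss1 = np.linalg.norm(reconstruct(core, U1, U2, U3) - T)
```

In the actual DinTucker setting, the local updates would run in parallel across machines and be integrated globally through the hierarchical prior; here they simply run sequentially on one tensor.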
Similar references
DinTucker: Scaling up Gaussian process models on multidimensional arrays with billions of elements
Infinite Tucker Decomposition (InfTucker) and random function prior models, as nonparametric Bayesian models on infinite exchangeable arrays, are more powerful than widely used multilinear factorization methods such as Tucker and PARAFAC decomposition, partly due to their capability of modeling nonlinear relationships between array elements. Despite their great predictive performance...
Fast Kernel Learning for Multidimensional Pattern Extrapolation
The ability to automatically discover patterns and perform extrapolation is an essential quality of intelligent systems. Kernel methods, such as Gaussian processes, have great potential for pattern extrapolation, since the kernel flexibly and interpretably controls the generalisation properties of these methods. However, automatically extrapolating large scale multidimensional patterns is in ge...
Scaling Multidimensional Gaussian Processes using Projected Additive Approximations
Exact Gaussian Process (GP) regression has O(N³) runtime for data size N, making it intractable for large N. Advances in GP scaling have not been extended to the multidimensional input setting, despite the preponderance of multidimensional applications. This paper introduces and tests a novel method of projected additive approximation to multidimensional GPs. We illustrate the power of this me...
A Simple Construction of the Fractional Brownian Motion
In this work we introduce correlated random walks on Z. When picking suitably at random the coefficient of correlation, and taking the average over a large number of walks, we obtain a discrete Gaussian process, whose scaling limit is the fractional Brownian motion. We have to use two radically different models for the cases 1/2 ≤ H < 1 and 0 < H < 1/2. This result provides an algorithm for t...
Bayesian inference with rescaled Gaussian process priors
Abstract: We use rescaled Gaussian processes as prior models for functional parameters in nonparametric statistical models. We show how the rate of contraction of the posterior distributions depends on the scaling factor. In particular, we exhibit rescaled Gaussian process priors yielding posteriors that contract around the true parameter at optimal convergence rates. To derive our results we e...
Publication date: 2016